Genetic diversity of SARS-CoV-2 infections in Ghana from 2020-2021

The COVID-19 pandemic is one of the fastest evolving pandemics in recent history. As such, the SARS-CoV-2 viral evolution needs to be continuously tracked. This study sequenced 1123 SARS-CoV-2 genomes from patient isolates (121 from arriving travellers and 1002 from communities) to track the molecular evolution and spatio-temporal dynamics of the SARS-CoV-2 variants in Ghana. The data show that initial local transmission was dominated by B.1.1 lineage, but the second wave was overwhelmingly driven by the Alpha variant. Subsequently, an unheralded variant under monitoring, B.1.1.318, dominated transmission from April to June 2021 before being displaced by Delta variants, which were introduced into community transmission in May 2021. Mutational analysis indicated that variants that took hold in Ghana harboured transmission enhancing and immune escape spike substitutions. The observed rapid viral evolution demonstrates the potential for emergence of novel variants with greater mutational fitness as observed in other parts of the world.

A year after the World Health Organization (WHO) declared the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) a pandemic, over 190 million confirmed cases and 4 million deaths have been reported worldwide 1 . As of 27th September 2021, Ghana's cumulative COVID-19 cases stood at 127,482 with an active case count of 3088 and 1156 deaths 2 . Through the COVAX initiative, 1.23 million people in Ghana have received vaccine doses, with 376,000 fully vaccinated with the AstraZeneca vaccine 3,4 . Compliance with prescribed preventive measures has been low to average within the communities, and there is evidence of persistent albeit mostly asymptomatic infections, especially in Accra and other major cities in Ghana 5 .
COVID-19 control measures in Ghana have evolved with the global pandemic. Ghana's international airports and all land borders were closed to international travel on 22nd March 2020, followed by a partial lockdown of two major cities from 30th March to 22nd April 2020. This was coupled with enhanced testing and contact tracing to track community spread. The airport was reopened to international travel on 1st September 2020, with twofold containment measures: (1) travellers must show proof of a negative COVID-19 test (taken at most 72 h before arrival) and (2) travellers must test negative for SARS-CoV-2 by antigen test upon arrival at the Kotoka International Airport (KIA) 2 . The guidelines for travellers who test positive at the airport have changed over time. Initially, travellers who tested positive upon arrival had to undergo mandatory isolation for at least 14 days (at the travellers' cost) and were only allowed to leave after a negative PCR/antigen test. The guidelines were later relaxed to allow self-isolation, but mandatory isolation has been reinstated due to poor compliance 2 . Currently, a positive test leads to a minimum of 3 days in isolation, after which travellers who test negative by RT-PCR are allowed to leave. Since January 2021, all positive samples from travellers have been made available for genomic sequencing.
Like other RNA viruses, most mutations in the SARS-CoV-2 genome arise during viral replication, and the resulting mutant viruses are then subjected to selective pressures within the host and/or during inter-person transmission. Whole-genome sequencing (WGS) is critical in tracking viral genomic changes and may help to understand phenotypic changes. According to the WHO, several variants of interest (VOIs) have been shown to harbour amino acid changes associated with enhanced community transmission or multiple COVID-19 cases/clusters in numerous countries 6,7 . Other VOIs have proven to be variants of concern (VOCs) due to their increased transmissibility, virulence, and disease severity, or decreased susceptibility to public health measures, available diagnostics, vaccines, and therapeutics 7 . These may be demonstrated by increased receptor binding, reduced virus neutralisation by antibodies generated against previous infection or vaccination, loss or reduced diagnostic detection, or increased replication 7 . The SARS-CoV-2 VOCs and VOIs listed by WHO include; Alpha (B.  6,7 . Only Beta and Eta were first reported in South Africa and Nigeria, respectively 6,8 .
Having established a local capacity for sequencing and analysing SARS-CoV-2 genomes in Ghana 9 , molecular surveillance has continued using samples provided by the COVID-19 testing laboratories across the country. In addition, some samples from international travellers who tested positive on arrival at the airport were also analysed to track the introduction of new variants into the country. Thus, this report provides a comprehensive analysis of the genetic diversity of SARS-CoV-2 viruses that caused infections in the communities in Ghana from March 2020 to September 2021.
In 2021, there was a marked shift in the circulating variants and the occurrence of region-specific outbreaks, with Eta dominating in the Northern and middle belt regions, while B.1.1.318 dominated the major cities. The highest frequencies of Eta variants were observed in the Northern (24%, 13/55), Bono East (31%, 4/13) and Eastern (13%, 3/23) regions (Fig. 1d). The city of Tamale in the Northern region is the gateway and central trading hub for Ghana's northern neighbours, whilst the Bono East region harbours major interaction routes with Ivory Coast in the western corridor of Ghana. Meanwhile, about a fifth of all the variants detected in 2021 were B.1.1.318 (22%, 176/802), and Greater Accra, where the capital city and the major international airport are located, had 80% (140/176) of all the B.1.1.318 genomes (Fig. 1d). These data suggest that Eta and B.1.1.318 variants, which dominated transmission in these areas in April-May 2021, could have been introduced through these major land borders. The B.1, B.1.1.359, B.1.1, and B.1.623 lineages that dominated Ghana in 2020 were supplanted by the Alpha and Delta VOCs in most of the regions. It is worth noting that in regions where more than 50 samples were sequenced in 2021, there was penetration or transmission of various VOCs, including the Central region, Bono East, Greater Accra, and Volta Region (Fig. 1d).

Importation of SARS-CoV-2 variants into Ghana by travellers.
One hundred and twenty-one of the sequenced samples (11%, 121/1123) were obtained from travellers identified as COVID-19 positive at the KIA. Of this number, Alpha accounted for 39% (n = 47) of the genomes, while the other VOCs accounted for lower proportions: Beta (6%, n = 7) and Delta lineages (7%, n = 8) (Supplementary Table 4). VOIs such as Eta (4%, n = 5) and the local variant under monitoring, B.1.1.318 (3%, n = 4), were detected at low proportions (Supplementary Table 4). Importantly, the VOC Alpha was identified in travellers entering Ghana from all over the world, including other African countries, in January and March 2021 (Table 1). Furthermore, VOCs were detected in travellers from several of Ghana's neighbouring countries, including Nigeria, Ivory Coast, and Burkina Faso, demonstrating that these variants were already in those countries even though not reported or detected (Table 1). In most cases, VOCs and VOIs were identified amongst quarantined travellers before their detection within local samples. Travellers from Nigeria, Dubai, and the UK accounted for most detections of Alpha, Beta, Eta, and Delta variants (Table 1). Interestingly, the Beta and Kappa variants did not become dominant in Ghana; instead, B.1.1.318, which was detected in travellers from Nigeria, Gabon, and Dubai, became dominant in Ghana between April and June 2021.
Temporal trends of SARS-CoV-2 variant detection and frequency. Ghana was one of the last African countries to detect COVID-19 cases in March 2020, and the waves of COVID-19 in Ghana have lagged slightly behind other African countries and significantly behind the rest of the world (Fig. 2a). Previous work from our group described the viral genome dynamics between March and May 2020, when Ghana was largely closed to international travel (Ngoi et al. 9 ). Different variants rose to dominance at different times and during different infection waves across the country (Fig. 2a, b). Variants that cluster closely to B.
Genetic diversity and evolutionary relationships of the SARS-CoV-2 variants. Amongst the many individual lineages represented in the data presented here, Delta lineages, Alpha, B.1.1.318, B.1.1.359, B.1.1, and Eta were the most evolved, with the highest genetic diversity (Fig. 3a). These variants varied in the number of mutations from sample to sample, with Delta, Alpha and B.1.1.318 presenting a mean of ~30 mutations (range 20-45) in the majority of the genomes (Fig. 3a). The Delta VOCs had the highest mean (~35) and presented an interquartile range of 25 to 45 mutations across all the samples (Fig. 3a). It is worth noting that this level of genetic diversity in Delta lineages was mainly attributed to the sublineages (n = 200/360), chiefly AY.39 (174/200) and AY.37 (15/200) (Supplementary Table 5). Most of the other lineages with a small range of mutations were reported in 2020 and occurred sporadically in very few samples, hence the relatively low genetic diversity (Fig. 3a). The high level of genetic diversity in most VOCs, including B.1.1.318, probably indicates local evolution in Ghana and consequent adaptation, compared to the other variants that did not gain prominence in the Ghanaian population (Fig. 3a).
A snapshot of the evolutionary relationships of these VOCs in Ghana shows how the variants are related through space and time throughout the epidemic (Fig. 3b). Using a phylogenetic tree, we outline the phylogenetic relationships of the VOCs and how they gained prominence coinciding with the COVID-19 waves in Ghana. Although the COVID-19 pandemic started in mid-November 2019, the tree shows that the earliest lineages in Ghana date to March 2020, and most VOCs were introduced in 2021 (Fig. 3b). The phylogenetic analysis of the genomes from Ghana shows similarities to VOCs around the world, with all the VOCs having the same common ancestor (Wuhan). Still, as they diverge, they share more recent ancestors; for example, we show that the B.1.1.318 and Alpha variants share several recent ancestors (Fig. 3b). The root-to-tip divergence of the VOCs as a function of sampling time shows a molecular clock for the various VOCs, with strong evidence that the variants are evolving in a clock-like manner (R² = 0.71) (Fig. 3c). The variants in Ghana are gaining ~26 mutations per year. Of particular interest is B.1.1.318, which did not gain prominence worldwide but has a molecular clock similar to most of the VOCs in Ghana (Fig. 3c). Mutational fitness analysis of the B.1.1.318 lineage showed that ten samples had spike mutations that were likely to confer viral fitness (mutational fitness > 1) (Fig. 3d).
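The clock-like signal reported here rests on a regression of root-to-tip divergence against sampling date: the slope estimates the number of mutations gained per year and R² measures how closely the samples follow a single rate. The sketch below illustrates that idea with an ordinary least-squares fit in plain Python. The function name `fit_clock` and the toy samples are hypothetical and are not taken from the study's data; the Nextstrain pipeline used in the study performs its own estimation on the tree.

```python
def fit_clock(dates, divergences):
    """Least-squares fit of divergence = rate * date + intercept.

    dates: sampling dates as decimal years (e.g. 2021.25)
    divergences: mutations relative to the root for each sample
    Returns (rate per year, intercept, R squared).
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need two or more paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in divergences)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all samples share one date; the rate is undefined")
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return rate, intercept, r_squared


# Hypothetical (decimal-year date, mutation count) pairs for illustration only.
toy_samples = [
    (2020.25, 8), (2020.50, 13), (2020.75, 21),
    (2021.00, 27), (2021.25, 34), (2021.50, 40),
]
toy_dates = [t for t, _ in toy_samples]
toy_divergences = [d for _, d in toy_samples]
rate, intercept, r_squared = fit_clock(toy_dates, toy_divergences)
print(f"rate = {rate:.1f} mutations/year, R^2 = {r_squared:.2f}")
```

Under a strict clock, the date at which the fitted line crosses zero divergence (`-intercept / rate`) approximates the date of the root.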
Mutational analysis of the amino acid substitutions. The most abundant substitution in all the samples was the spike D614G (97%, 972/1002), followed by ORF1b: P314L (91%, 915/1002) (Fig. 4a). For most of the genes, one or more amino acid substitutions occurred in more than 100 samples, although the spike protein dominated the profile (Fig. 4a). Interestingly, some variants from different evolutionary lineages had similar amino acid substitutions, mainly in the spike glycoprotein. The Eta variant had the highest (three) individual amino acid substitutions (Q52R, Q677H and F888L) in the spike protein compared to other VOCs, probably contributing to its adaptability in Africa. Compared to other VOCs, the substitutions unique to the Alpha variant were S13I, R567K, A570D, and T716I. The only spike substitution unique to B.1.1.318, compared to other VOCs and VOIs, was D1127G (Fig. 4b). Within these samples, the Delta lineages shared 14 substitutions (T19R, G142D, R158G, A222V, L452R, T478K, E484Q, D614G, S680F, P681R, D950N, K1191N, G1219V and C1253F) (Fig. 4b). The unique substitutions were fewer than the amino acid substitutions shared among lineages (Fig. 4b), which explains the increased abundance of some substitutions among the VOCs. Those with the highest frequency in the spike protein among Delta lineages, B.1, B.1.1, B.1.1.318, Alpha, Beta and Eta were the fitness substitutions D614G and P681R/H (Fig. 4b). Alpha and B.1.1.318 had the P681H substitution, while P681R was present in Delta lineages. The immune escape substitution E484K was present in B.1.1.318, Beta, and Eta, while Delta variants frequently presented with the E484Q substitution.
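The distinction between unique and shared substitutions can be expressed with simple set operations. The sketch below is illustrative only: the profiles are partial, assembled solely from spike substitutions named in this section, and are not complete lineage definitions; the dictionary `PROFILES` and the helper names are hypothetical.

```python
# Partial spike profiles built only from substitutions named in the text.
# They are illustrative and NOT complete lineage definitions.
PROFILES = {
    "Alpha": {"D614G", "P681H", "A570D", "T716I"},
    "B.1.1.318": {"D614G", "P681H", "E484K", "D1127G"},
    "Beta": {"D614G", "E484K"},
    "Delta": {"D614G", "P681R", "E484Q", "T19R", "L452R", "T478K"},
    "Eta": {"D614G", "E484K", "Q52R", "Q677H", "F888L"},
}


def shared_by_all(profiles):
    """Substitutions present in every lineage profile."""
    return set.intersection(*profiles.values())


def unique_to(profiles, lineage):
    """Substitutions found in one lineage and in no other profile."""
    others = set().union(
        *(subs for name, subs in profiles.items() if name != lineage)
    )
    return profiles[lineage] - others


def carriers(profiles, substitution):
    """Sorted names of the lineages that carry a given substitution."""
    return sorted(name for name, subs in profiles.items() if substitution in subs)


print("Shared by all:", sorted(shared_by_all(PROFILES)))
for lineage in PROFILES:
    print(lineage, "unique:", sorted(unique_to(PROFILES, lineage)))
print("E484K carriers:", carriers(PROFILES, "E484K"))
```

With full per-lineage profiles from an annotation tool, the same three helpers would yield the unique and shared counts summarised in a figure such as Fig. 4b.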

Discussion
Having established local capacity to generate high-quality genome sequences and comprehensively analyse them in-house, we conducted genomic surveillance in Ghana from March 2020 to September 2021 and performed an in-depth analysis of the resulting 1123 SARS-CoV-2 sequences. This study represents the most extensive genomic analysis of the SARS-CoV-2 viruses driving the COVID-19 pandemic in Ghana. Trends in SARS-CoV-2 infections in Ghana have followed a pattern similar to that in the rest of Africa and globally, although Ghana has consistently lagged behind the rest of Africa and the world during the second and third waves of COVID-19. The emergence of new variants such as Alpha and Delta has been responsible for the second and third COVID-19 waves in Ghana, as has been reported in other countries in Africa and globally 1 . The Ghana Health Service monitoring data indicate that the Greater Accra Region was and still is the epicentre of COVID-19 infections 2 . Other regions with major urban cities, including Ashanti, Western and Central, have had high infection levels and circulating variants similar to those in Greater Accra, likely due to a high volume of intercity travel across these regions. Regions like Northern and Upper East, further from Accra, tended to have different variants during the second wave. In the third wave, these regions still lag behind the rest of the country and do not seem to be undergoing a third wave yet 2 . These regions receive far fewer international travellers from global COVID-19 hotspots than Greater Accra and Ashanti. Furthermore, they have a sparser population with less congested cities than the Greater Accra and Ashanti regions. Studies have shown that higher population density increases the contact rates necessary for SARS-CoV-2 transmission 10 .
Before the airport reopened to international travellers in September 2020, the B.1.1 variant was dominant in Ghana and remained the most dominant circulating lineage throughout    14 .
Our dataset showed that Delta lineages dominated in Ghana from June 2021 and remained dominant as of September 2021. The worldwide dominance of Delta variants 15,16 has been linked to the P681H/R substitutions 3 as well as additional mutations in the viral RNA-dependent RNA-polymerase coding sequence. It has been suggested that these mutations enhance replication speed and significantly increase the number of cases 17 . As such, it was not surprising that Delta accounted for over 30% of all the cases of COVID-19 successfully sequenced in the current study. This is consistent with data from India, where Delta was first detected in clinical cases and was responsible for many COVID-19 case fatalities in that country 18,19 . Indeed, Delta lineages have been shown to increase COVID-19 virulence and poor prognosis in certain populations 20 . Fatality Rates remain low to date, a further indication of the apparent resilience of the Ghanaian population to COVID-19 highlighted in our previous study 5 . Nevertheless, as a limitation of the study, the dominance of the VOC in our study may result from the purposive sampling and potential underrepresentation of some variants in other regions.
Through continuous genomic surveillance, we have characterised the diversity and evolution of COVID-19 variants in Ghana. We observed high variation in the number of mutations between samples, suggesting evolution and multiple independent emergences of the Ghanaian variants against the backdrop of a multi-ethnic society. In addition, high mutation frequency within a population leads to higher chances of diverse variants. Although our study has the limitation of relying on purposive sampling and lacks clinical/epidemiological data such as symptoms and comorbidities, the large sample size and the depth of genetic analysis performed give high confidence that these data are robust and provide a credible overview of the evolution of the pandemic in Ghana. Enhancing and speeding up vaccinations should be a priority, as well as the pursuit of therapeutic options. Ongoing virus and virus-host interaction experiments, combined with enhanced studies on severe/critically ill patients, should also give more insights into the pathogenesis of SARS-CoV-2 in Ghana.

Methods
Sample selection and processing. Samples confirmed as SARS-CoV-2 positive by Real-Time PCR were selected for genome sequencing. Viral RNA from nasopharyngeal and oropharyngeal samples was extracted using the QIAamp Viral RNA extraction kit (Qiagen, Hilden, Germany). The extracted total RNA concentration was measured using the Qubit™ RNA HS Assay Kit on a Qubit 4 Fluorometer (Thermo Fisher Scientific™, MA, USA). The integrity and quality of RNA were checked using the Agilent RNA 6000 Nano Kit on the Bioanalyzer (Agilent™ Tech. Inc., CA, USA). The ARTIC LoCost protocol (https://artic.network/ncov-2019) was used for sequencing (Extended Methods) as follows. The extracted RNA was converted into cDNA using the LunaScript® RT SuperMix kit (New England Biolabs, UK). The ARTIC V3 primer pools and Q5® Hot Start High-Fidelity DNA polymerase (New England Biolabs, UK) were used for multiplex tiled PCR to generate overlapping amplicons from the cDNA as per the protocol. Sequencing libraries were prepared by end preparation of the amplicons using the NEBNext Ultra II End Repair/dA-tailing module (New England Biolabs, UK) and afterwards barcoded using the EXP-NBD196 kit (Oxford Nanopore Technologies, UK) or the Blunt/TA Ligase Master Mix (New England Biolabs, UK). The barcoded amplicons were then pooled and purified using Ampure XP beads (Beckman Coulter). The purified barcoded library was quantified using the Qubit™ DNA HS Assay Kit (Thermo Fisher Scientific™, USA), and about 75 ng of the barcoded library was ligated to the AMII sequencing adaptors (Oxford Nanopore Technologies, UK) using the Quick ligation kit (New England Biolabs, UK). The adaptor-ligated library was finally purified and quantified using Ampure XP beads (Beckman Coulter) and the Qubit™ DNA HS Assay Kit (Thermo Fisher Scientific™, USA), respectively. About 20 ng of the purified adaptor-ligated library was loaded on an R9.4.1 flow cell (FLO-MIN106). The sequencing was carried out using a MinION Mk1b or Mk1c device (Oxford Nanopore Technologies, UK). Our previously published data from March to May 2020 were included in the study to help provide continuity to SARS-CoV-2 genomic epidemiology in Ghana 9 .
Generation of SARS-CoV-2 genomes. Base-calling and demultiplexing of MinION Fast5 files were performed using Guppy (versions 3.4.3 to 5.0.7) according to the ARTIC bioinformatic protocols 23 . Sequencing QC was assessed using pycoQC, and demultiplexed reads were aggregated and length filtered using ARTIC guppyplex, with a minimum read length of 400 bases and a maximum of 700 bases, to remove chimeric reads. Read QC was assessed using NanoPlot before read alignment, variant calling and consensus generation using the ARTIC MinION software (ARTIC version 1.2.1). Alignment metrics, amplicon coverage analysis, variant annotation, and consensus assessment were performed using samtools, mosdepth, BCFtools, SnpEff, and QUAST according to the nf-core/viralrecon pipeline (version 2.2) 24,25 . Variant annotation, validation and quality assessment of the consensus sequence were performed using Viral Annotation DefineR (version 1.1.3) 26 .
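The length-filtering step keeps only reads whose length falls inside a window around the expected amplicon size, which discards short fragments and over-long chimeric reads. The sketch below shows the idea in plain Python, interpreting the 400 and 700 thresholds as read lengths in bases. It is not the ARTIC guppyplex implementation; the function names and the toy reads are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def length_filter(records, min_len=400, max_len=700):
    """Keep records whose sequence length lies within [min_len, max_len]."""
    for header, sequence, quality in records:
        if min_len <= len(sequence) <= max_len:
            yield header, sequence, quality


def make_record(name, length):
    """Build a toy FASTQ record of a given length (hypothetical data)."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"


# Three toy reads: a short fragment, an amplicon-sized read, a long chimera.
toy_fastq = StringIO(
    make_record("short", 150)
    + make_record("amplicon", 480)
    + make_record("chimera", 950)
)
kept = [header for header, _, _ in length_filter(read_fastq(toy_fastq))]
print(kept)
```

In this toy input only the 480-base read lies inside the window, so it is the only one retained.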
Genomes that passed quality control were deposited on GISAID and ENA. Phylogenetic assignment of the consensus sequences to the globally named outbreak lineages was performed using Pangolin (versions: pangolin-3.1.14, pangoLEARN 2021-10-13, Pango-designation-1.2.86) according to Rambaut et al. 27 . A coverage map was generated by comparing the Ghanaian SARS-CoV-2 genomes with the reference genome (Wuhan-Hu-1/2019) using Nextclade CLI (version 1.4.0). This web-based tool performs banded Smith-Waterman alignment with an affine gap penalty 28 . Nextclade was also used to perform clade assignment and overall quality assessment of the genomes. Further analysis was executed in R (version 4.0.4). Contingency tables were constructed using summary statistics, and a p value < 0.05 was considered statistically significant.
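As an illustration of how a contingency table leads to a p value, the sketch below applies a Pearson chi-square test to a 2 x 2 table in plain Python. The text does not state which test was used, so the choice of test is an assumption, and the comparison shown (Eta frequency in the Northern versus Eastern regions, using the counts 13/55 and 3/23 reported in this article) is an example rather than an analysis reported by the authors. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (chi-square statistic, p value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative table: Eta versus non-Eta genomes in two regions.
# Northern: 13 Eta of 55 genomes; Eastern: 3 Eta of 23 genomes.
chi2, p_value = chi_square_2x2(13, 55 - 13, 3, 23 - 3)
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f} ({verdict} at 0.05)")
```

With cell counts this small, an exact test would usually be preferred over the chi-square approximation; the sketch is meant only to show the mechanics.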
Phylogenetic analysis. Phylogenetic analysis of the SARS-CoV-2 genomes was performed using the Nextstrain pipelines (v11) 28 . Briefly, the Nextstrain pipeline incorporates various quality control processes, including validation of the clinical metadata, alignment of sequences using nextalign to identify gaps compared to the SARS-CoV-2 reference genome (MN908947.3 and LR757998.1), and running Pangolin to assign lineage labels 28 . Next, 100 bp from the start and 50 bp at the end were masked, as were the sites prone to sequencing errors (13402, 24389 and 24390) 28 . An initial maximum likelihood phylogenetic tree was constructed using augur's fast and stochastic algorithm (IQ-TREE) 29 with a generalised time-reversible substitution model. This tree was refined to estimate divergence, time, and node dates using a coalescent timescale, then exported to auspice and R (version 4.0.4) for visualisation. Molecular clock estimation for all the lineages and mutational fitness analysis for the B.1.1.318 lineage were also performed using the Nextstrain pipelines (v11). The mutational fitness was based on hierarchical Bayesian multinomial logistic regression 30 .
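The masking step can be pictured as replacing unreliable positions of each aligned genome with `N` before tree building. The sketch below is a plain-Python illustration using the values given in the text (100 bp at the start, 50 bp at the end, and sites 13402, 24389 and 24390). It assumes the sites are 1-based coordinates on a sequence already aligned to the reference; the function name and the toy genome are hypothetical, and this is not the pipeline's own code.

```python
def mask_genome(sequence, mask_start=100, mask_end=50,
                sites=(13402, 24389, 24390), mask_char="N"):
    """Mask the ends and selected sites of a reference-aligned genome.

    sites are interpreted as 1-based reference coordinates (an assumption).
    """
    bases = list(sequence)
    length = len(bases)
    # Mask the first mask_start bases.
    for i in range(min(mask_start, length)):
        bases[i] = mask_char
    # Mask the last mask_end bases.
    for i in range(max(length - mask_end, 0), length):
        bases[i] = mask_char
    # Mask individual error-prone sites, converting to 0-based indices.
    for position in sites:
        if 1 <= position <= length:
            bases[position - 1] = mask_char
    return "".join(bases)


# Toy genome with the length of the Wuhan-Hu-1 reference (29,903 bases).
toy_genome = "A" * 29903
masked = mask_genome(toy_genome)
print(masked.count("N"), "positions masked out of", len(masked))
```

Masking leaves the sequence length and coordinates unchanged, so downstream tools that rely on reference positions are unaffected.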
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.